Skip to content

Conversation

@HyukjinKwon
Copy link
Member

What changes were proposed in this pull request?

This PR proposes to move the doctests in registerJavaUDAF and registerJavaFunction to the proper unittests that run conditionally when the test classes are present.

Both tests are dependent on the test classes in JVM side, test.org.apache.spark.sql.JavaStringLength and test.org.apache.spark.sql.MyDoubleAvg. So if you run the tests against the plain sbt package, it fails as below:

**********************************************************************
File "/.../spark/python/pyspark/sql/udf.py", line 366, in pyspark.sql.udf.UDFRegistration.registerJavaFunction
Failed example:
    spark.udf.registerJavaFunction(
        "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
Exception raised:
    Traceback (most recent call last):
   ...
test.org.apache.spark.sql.JavaStringLength, please make sure it is on the classpath;
...
   6 of   7 in pyspark.sql.udf.UDFRegistration.registerJavaFunction
   2 of   4 in pyspark.sql.udf.UDFRegistration.registerJavaUDAF
***Test Failed*** 8 failures.

Why are the changes needed?

In order to support to run the tests against the plain SBT build. See also https://spark.apache.org/developer-tools.html

Does this PR introduce any user-facing change?

No, it's test-only.

How was this patch tested?

Manually tested as below:

./build/sbt -DskipTests -Phive-thriftserver clean package
cd python
./run-tests --python-executable=python3 --testname="pyspark.sql.udf UserDefinedFunction"
./run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_udf UDFTests"
./build/sbt -DskipTests -Phive-thriftserver clean test:package
cd python
./run-tests --python-executable=python3 --testname="pyspark.sql.udf UserDefinedFunction"
./run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_udf UDFTests"

@HyukjinKwon
Copy link
Member Author

Also cc @bersprockets I remember you took a look for this one as well.

@HyukjinKwon
Copy link
Member Author

Also cc @dongjoon-hyun FYI

@dongjoon-hyun
Copy link
Member

Thank you for pinging me, @HyukjinKwon .

df.createOrReplaceTempView("df")
row = self.spark.sql(
"SELECT name, javaUDAF(id) as avg from df group by name order by name desc").first()
self.assertEqual(row.asDict(), Row(name='b', avg=102.0).asDict())
Copy link
Member

@dongjoon-hyun dongjoon-hyun Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we don't compare [Row(name=u'b', avg=102.0), Row(name=u'a', avg=102.0)]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could compare them as are. It's just to prevent an issue such as SPARK-29748 or similar issues in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it~ No problem~

@SparkQA
Copy link

SparkQA commented Jun 11, 2020

Test build #123811 has finished for PR 28795 at commit e7a9974.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you for making this, @HyukjinKwon .
I also verified manually. Merged to master/3.0.

dongjoon-hyun pushed a commit that referenced this pull request Jun 11, 2020
…egistration to test conditionally

### What changes were proposed in this pull request?

This PR proposes to move the doctests in `registerJavaUDAF` and `registerJavaFunction` to the proper unittests that run conditionally when the test classes are present.

Both tests are dependent on the test classes in JVM side, `test.org.apache.spark.sql.JavaStringLength` and `test.org.apache.spark.sql.MyDoubleAvg`. So if you run the tests against the plain `sbt package`, it fails as below:

```
**********************************************************************
File "/.../spark/python/pyspark/sql/udf.py", line 366, in pyspark.sql.udf.UDFRegistration.registerJavaFunction
Failed example:
    spark.udf.registerJavaFunction(
        "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
Exception raised:
    Traceback (most recent call last):
   ...
test.org.apache.spark.sql.JavaStringLength, please make sure it is on the classpath;
...
   6 of   7 in pyspark.sql.udf.UDFRegistration.registerJavaFunction
   2 of   4 in pyspark.sql.udf.UDFRegistration.registerJavaUDAF
***Test Failed*** 8 failures.
```

### Why are the changes needed?

In order to support to run the tests against the plain SBT build. See also https://spark.apache.org/developer-tools.html

### Does this PR introduce _any_ user-facing change?

No, it's test-only.

### How was this patch tested?

Manually tested as below:

```bash
./build/sbt -DskipTests -Phive-thriftserver clean package
cd python
./run-tests --python-executable=python3 --testname="pyspark.sql.udf UserDefinedFunction"
./run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_udf UDFTests"
```

```bash
./build/sbt -DskipTests -Phive-thriftserver clean test:package
cd python
./run-tests --python-executable=python3 --testname="pyspark.sql.udf UserDefinedFunction"
./run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_udf UDFTests"
```

Closes #28795 from HyukjinKwon/SPARK-31965.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 56264fb)
Signed-off-by: Dongjoon Hyun <[email protected]>
@HyukjinKwon
Copy link
Member Author

Thanks!

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, I always hit this problem ;) Thanks @HyukjinKwon !

holdenk pushed a commit to holdenk/spark that referenced this pull request Jun 25, 2020
…egistration to test conditionally

### What changes were proposed in this pull request?

This PR proposes to move the doctests in `registerJavaUDAF` and `registerJavaFunction` to the proper unittests that run conditionally when the test classes are present.

Both tests are dependent on the test classes in JVM side, `test.org.apache.spark.sql.JavaStringLength` and `test.org.apache.spark.sql.MyDoubleAvg`. So if you run the tests against the plain `sbt package`, it fails as below:

```
**********************************************************************
File "/.../spark/python/pyspark/sql/udf.py", line 366, in pyspark.sql.udf.UDFRegistration.registerJavaFunction
Failed example:
    spark.udf.registerJavaFunction(
        "javaStringLength", "test.org.apache.spark.sql.JavaStringLength", IntegerType())
Exception raised:
    Traceback (most recent call last):
   ...
test.org.apache.spark.sql.JavaStringLength, please make sure it is on the classpath;
...
   6 of   7 in pyspark.sql.udf.UDFRegistration.registerJavaFunction
   2 of   4 in pyspark.sql.udf.UDFRegistration.registerJavaUDAF
***Test Failed*** 8 failures.
```

### Why are the changes needed?

In order to support to run the tests against the plain SBT build. See also https://spark.apache.org/developer-tools.html

### Does this PR introduce _any_ user-facing change?

No, it's test-only.

### How was this patch tested?

Manually tested as below:

```bash
./build/sbt -DskipTests -Phive-thriftserver clean package
cd python
./run-tests --python-executable=python3 --testname="pyspark.sql.udf UserDefinedFunction"
./run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_udf UDFTests"
```

```bash
./build/sbt -DskipTests -Phive-thriftserver clean test:package
cd python
./run-tests --python-executable=python3 --testname="pyspark.sql.udf UserDefinedFunction"
./run-tests --python-executable=python3 --testname="pyspark.sql.tests.test_udf UDFTests"
```

Closes apache#28795 from HyukjinKwon/SPARK-31965.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 56264fb)
Signed-off-by: Dongjoon Hyun <[email protected]>
@HyukjinKwon HyukjinKwon deleted the SPARK-31965 branch July 27, 2020 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants